While the rollout of the fifth-generation mobile network (5G) is underway across the globe with the intention of delivering 4K/8K UHD video, Augmented Reality (AR), and Virtual Reality (VR) content to massive numbers of users, coverage and throughput remain significant issues, especially in rural areas where only low-frequency-band 5G is being deployed. This calls for a high-performance adaptive bitrate (ABR) algorithm that can maximize user quality of experience (QoE) given 5G network characteristics and the data rates of UHD content. Recently, many of the newly proposed ABR techniques have been machine-learning based. Among them, Pensieve is one of the state-of-the-art techniques, using reinforcement learning to generate an ABR algorithm from observations of past decision performance. By incorporating the context of the 5G network and UHD content, Pensieve has been optimized into Pensieve 5G. New QoE metrics that more accurately represent the QoE of UHD video streaming on different types of devices were proposed and used to evaluate Pensieve 5G against other ABR techniques, including the original Pensieve. Results from simulations based on real 5G Standalone (SA) network throughput show that Pensieve 5G outperforms both conventional algorithms and Pensieve, with average QoE improvements of 8.8% and 14.2%, respectively. Additionally, Pensieve 5G also performed well on a commercial 5G NR-NR Dual Connectivity (NR-DC) network, despite being trained solely on data from the 5G SA network.
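As a concrete reference point, the sketch below computes the linear QoE formulation used by the original Pensieve work (per-chunk quality minus rebuffering and bitrate-switching penalties); the penalty weights and the function name are illustrative assumptions, and the device-aware metrics proposed here would replace the quality term.

```python
# A minimal sketch of the linear QoE formulation from the original Pensieve
# work; the penalty weights below are assumptions for illustration, and the
# device-specific metrics proposed in this paper would replace the quality term.
def linear_qoe(bitrates_kbps, rebuffer_s, rebuf_penalty=4.3, smooth_penalty=1.0):
    """QoE = sum of per-chunk quality - rebuffering penalty - smoothness penalty."""
    quality = sum(bitrates_kbps) / 1000.0                 # quality utility (Mbps)
    rebuffer = rebuf_penalty * sum(rebuffer_s)            # stall penalty
    smoothness = smooth_penalty * sum(
        abs(b2 - b1) / 1000.0
        for b1, b2 in zip(bitrates_kbps, bitrates_kbps[1:])
    )                                                     # bitrate-switching penalty
    return quality - rebuffer - smoothness
```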
Most machine vision tasks (e.g., semantic segmentation) are based on images encoded and then decoded by an image codec (e.g., JPEG). However, these decoded images in the pixel domain introduce distortion and are optimized for human perception, making the execution of machine vision tasks suboptimal. In this paper, we propose a compressed-domain method to improve segmentation tasks. i) A dynamic and a static channel selection method are proposed to reduce the redundancy of the compressed representations obtained by encoding. ii) Two different transform modules are explored and analyzed to help transform the compressed representations into features for the segmentation network. Experimental results show that we can save up to 15.8% bitrate compared with state-of-the-art compressed-domain work, and save about 83.6% bitrate and 44.8% inference time compared with pixel-domain-based methods.
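To make the channel-selection idea concrete, here is a minimal sketch of a dynamic (content-adaptive) channel gate over a compressed latent; the module layout and the top-k gating rule are assumptions for illustration rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class DynamicChannelSelector(nn.Module):
    """Scores each channel of a compressed latent and keeps only the top-k;
    a sketch of content-adaptive (dynamic) channel selection, not the
    paper's exact module."""
    def __init__(self, num_channels, keep):
        super().__init__()
        self.keep = keep
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),             # global context per channel
            nn.Flatten(),
            nn.Linear(num_channels, num_channels),
        )

    def forward(self, latent):                   # latent: (N, C, H, W)
        scores = self.score(latent)              # (N, C) per-channel importance
        idx = scores.topk(self.keep, dim=1).indices
        mask = torch.zeros_like(scores).scatter_(1, idx, 1.0)
        return latent * mask[:, :, None, None]   # zero out unselected channels
```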
Lossless image compression is an essential research field within image compression. Recently, learning-based image compression methods have shown impressive performance compared with traditional lossless methods such as WebP, JPEG2000, and FLIF. However, there are still many impressive techniques from lossy compression that could be applied to lossless compression. Therefore, in this paper, we explore methods widely used in lossy compression and apply them to lossless compression. Inspired by the impressive performance that the Gaussian Mixture Model (GMM) has shown in lossy compression, we build a lossless network architecture with a GMM. Noting the successful achievements of attention modules and autoregressive models, we propose to utilize attention modules and to add an extra autoregressive model over the raw image in our network architecture to improve performance. Experimental results show that our approach outperforms most classical lossless compression methods and existing learning-based methods.
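As an illustration of the autoregressive component mentioned above, the following is a minimal sketch of a PixelCNN-style masked convolution, the standard building block for conditioning on previously coded pixels of the raw image; the class name and mask type are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Type-'A' masked convolution (PixelCNN style): each output position only
    sees pixels that were already decoded, which is the causal structure an
    autoregressive model over the raw image requires."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones_like(self.weight)
        mask[:, :, kH // 2, kW // 2:] = 0   # block the current and future pixels in the row
        mask[:, :, kH // 2 + 1:, :] = 0     # block all future rows
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask       # enforce causality before every call
        return super().forward(x)
```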
Learned image compression achieves state-of-the-art accuracy and compression ratios, but its relatively slow runtime performance limits its usage. While previous attempts to optimize learned image codecs have focused more on the neural models and entropy coding, we present an alternative method for improving the runtime performance of various learned image compression models. We introduce a multi-threaded pipeline and an optimized memory model that fully utilize computational resources by letting GPU and CPU workloads execute asynchronously. Our architecture alone already yields excellent performance without changing the neural model itself. We also demonstrate that combining this architecture with previous tweaks to the neural models further improves runtime performance. We show that our implementation excels in both throughput and latency compared with the baseline, and we demonstrate its performance by creating a real-time video streaming encoder example application, with the encoder running on an embedded device.
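The following is a minimal sketch of the pipelining idea under stated assumptions: a GPU analysis transform and a CPU entropy coder process different images concurrently, connected by a bounded queue. The function names are placeholders, not the paper's API.

```python
import queue
import threading

def pipelined_encode(images, gpu_transform, cpu_entropy_code, depth=4):
    """Sketch of a two-stage encoding pipeline: the GPU stage produces latents
    while the CPU entropy coder consumes them, so the two workloads overlap.
    gpu_transform and cpu_entropy_code are assumed callables for illustration."""
    latents = queue.Queue(maxsize=depth)        # bounded queue limits memory use
    results = []

    def gpu_stage():
        for img in images:
            latents.put(gpu_transform(img))     # GPU-heavy analysis transform
        latents.put(None)                       # sentinel: no more items

    producer = threading.Thread(target=gpu_stage)
    producer.start()
    while (item := latents.get()) is not None:
        results.append(cpu_entropy_code(item))  # CPU work overlaps with GPU work
    producer.join()
    return results
```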
Image compression is a fundamental research field and many well-known compression standards have been developed over many decades. Recently, learned compression methods exhibit a fast development trend with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found that accurate entropy models for rate estimation largely affect the optimization of network parameters and thus the rate-distortion performance. Therefore, in this paper, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which achieves a more accurate and flexible entropy model. Besides, we take advantage of recent attention modules and incorporate them into the network architecture to enhance the performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge, our approach is the first work to achieve comparable performance with the latest compression standard, Versatile Video Coding (VVC), regarding PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM. The project page is at https://github.com/ZhengxueCheng/Learned-Image-Compression-with-GMM-and-Attention.
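For reference, a discretized Gaussian mixture likelihood evaluates the probability mass of each quantized latent as a weighted sum of per-component CDF differences over a unit-wide bin; the sketch below estimates the resulting rate in bits, with the tensor shapes chosen as an illustrative assumption.

```python
import torch
from torch.distributions import Normal

def discretized_gmm_bits(y_hat, weights, means, scales):
    """Estimated rate (in bits) of quantized latents y_hat under a K-component
    discretized Gaussian mixture: p(y) = sum_k w_k * (CDF(y + 0.5) - CDF(y - 0.5)).
    Assumed shapes for illustration: y_hat is (...,), mixture parameters are
    (..., K), and weights are normalized to sum to 1 along the last axis."""
    y = y_hat.unsqueeze(-1)                          # broadcast against K components
    comp = Normal(means, scales)
    pmf = comp.cdf(y + 0.5) - comp.cdf(y - 0.5)      # per-component discrete mass
    likelihood = (weights * pmf).sum(dim=-1).clamp_min(1e-9)
    return -torch.log2(likelihood).sum()             # total bits over all latents
```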
This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch, as its main deep learning engines. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains the major architecture of this software platform, several important functionalities that differentiate ESPnet from other open source ASR toolkits, and experimental results with major ASR benchmarks.
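As a small illustration of the Kaldi-style data format that ESPnet recipes build on, the sketch below writes the three core files of a Kaldi data directory (wav.scp, text, utt2spk); the helper function and its input structure are assumptions for illustration, not part of ESPnet itself.

```python
from pathlib import Path

def write_kaldi_data_dir(out_dir, utterances):
    """Sketch of preparing a Kaldi-style data directory: 'wav.scp' maps
    utterance-id to audio path, 'text' maps utterance-id to transcript, and
    'utt2spk' maps utterance-id to speaker-id. Entries are written sorted, as
    Kaldi-style tooling expects.
    utterances: iterable of (utt_id, wav_path, transcript, speaker_id)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with (out / "wav.scp").open("w") as wav_scp, \
         (out / "text").open("w") as text, \
         (out / "utt2spk").open("w") as utt2spk:
        for utt_id, wav_path, transcript, spk in sorted(utterances):
            wav_scp.write(f"{utt_id} {wav_path}\n")
            text.write(f"{utt_id} {transcript}\n")
            utt2spk.write(f"{utt_id} {spk}\n")
```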